Importing the data and preparing it so that it can be used for analysis

To move forward with the analysis, we need to make sure there are no NaN values in our data set.

We see that there are a lot of NaN values. This is because each station started and stopped recording data at different times, and some stations may have had breaks in data collection.

We will interpolate the missing values.
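As a minimal sketch of this step, assuming the data sits in a pandas DataFrame with one column per station and a time index: the station names and values below are made up for illustration, and `method="time"` with `limit_direction="both"` is one possible choice, not necessarily the one used in this analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperatures for three stations; NaN marks missing readings.
times = pd.date_range("2020-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "station_a": [5.0, np.nan, 7.0, 8.0, np.nan, 10.0],    # gaps mid-record
        "station_b": [np.nan, 4.0, 5.0, np.nan, 7.0, 8.0],     # starts late
        "station_c": [3.0, 3.5, np.nan, np.nan, 5.0, np.nan],  # ends early
    },
    index=times,
)

# Count the missing values per station before filling.
nan_before = df.isna().sum()

# Interior gaps are filled linearly in time. limit_direction="both" also fills
# leading and trailing gaps, which repeat the nearest valid reading.
filled = df.interpolate(method="time", limit_direction="both")
nan_after = filled.isna().sum()

print(nan_before)
print(nan_after)
```

Note that leading and trailing gaps are not truly interpolated here: they are padded with the first or last valid reading, which may or may not be acceptable for stations that started or stopped recording much later or earlier than the others.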

The end goal of the PCA is to find the principal components (modes) of variance of our data set. Using those, we want to visualize spatio-temporal weather patterns.

Before proceeding we will do a "test run" to interpolate the data to a grid of the Greater Victoria peninsula.

Start PCA analysis

Idea:

The first component accounts for almost 92% of the observed variance in the data. Although the second component is much smaller (accounting for 0.01% of the variance), we will nonetheless include it in the analysis.

Now we can reduce the dimensionality of our data set and project the data onto the directions of greatest variance.
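The projection can be sketched with plain NumPy, assuming the data is arranged as a matrix with one row per time step and one column per station. The synthetic data, the array names `X`, `Xc` and `Z`, and the use of a covariance eigendecomposition are assumptions for illustration; the analysis itself may rely on a library routine instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 time steps (rows) x 5 stations (columns).
# A shared signal plus small station-specific noise, so one mode dominates.
signal = rng.normal(size=(200, 1))
loadings = np.array([[1.0, 0.8, 1.2, 0.9, 1.1]])
X = signal @ loadings + 0.1 * rng.normal(size=(200, 5))

# Centre each station's series, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort the components from largest to smallest variance.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Fraction of the total variance explained by each component.
explained = eigvals / eigvals.sum()

# W holds the first two principal components as columns (one row per station).
W = eigvecs[:, :2]

# Project the centred data onto those two directions.
Z = Xc @ W

print(explained[:2])
print(W.shape, Z.shape)
```

The sign of each column of `W` is arbitrary, so only the relative sizes and signs of entries within a column carry meaning.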

Now we can interpolate the projected data onto a grid of the region, which will visualize the weather patterns there. To do so, we will use W, the projection matrix. The columns of W are the principal components, i.e., orthogonal basis vectors. The first column is the direction of greatest variance; the second column is the direction of greatest remaining variance among directions orthogonal to the first. We can view these columns as "weights". There are 39 rows in each column, one for each station, so each entry indicates how much the corresponding station contributes to that component's variance. Said more simply, an entry of large magnitude in W corresponds to a station whose data varies a lot along that component, compared to a station with a smaller entry. So temperatures will fluctuate more in the region around stations with greater weights (in W) than around those with lower weights.

griddata from scipy forms the convex hull of its input points. In our case, the input coordinates are the station coordinates, which form the shape seen in the plots above. The rest of the map (outside this convex hull) is just NaN. This is normal, since griddata (with the linear and cubic methods) only interpolates INSIDE the convex hull.
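A small sketch of this behaviour, using four made-up "station" coordinates at the corners of a unit square rather than the real station locations. The values and the grid extent are assumptions chosen so that some grid nodes fall outside the hull.

```python
import numpy as np
from scipy.interpolate import griddata

# Hypothetical station coordinates (x, y) at the corners of a unit square,
# with values sampled from the plane f(x, y) = 1 + x + 2y.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = 1.0 + points[:, 0] + 2.0 * points[:, 1]

# A grid that extends beyond the stations' convex hull.
axis = np.linspace(-1.0, 2.0, 7)  # -1, -0.5, 0, 0.5, 1, 1.5, 2
grid_x, grid_y = np.meshgrid(axis, axis)

# Linear interpolation: defined only inside the convex hull, NaN elsewhere.
linear = griddata(points, values, (grid_x, grid_y), method="linear")

# Nearest-neighbour interpolation: every grid node gets a value.
nearest = griddata(points, values, (grid_x, grid_y), method="nearest")

print(linear[3, 3])              # node at (0.5, 0.5), inside the hull
print(linear[0, 0])              # node at (-1, -1), outside the hull
print(np.isnan(nearest).any())
```

If values outside the hull are needed, `method="nearest"` or the `fill_value` argument are two options, though neither is a real extrapolation of the weather pattern.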